
Revert "[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache" (#30681) #36076

Draft

zhewenl wants to merge 1 commit into vllm-project:main from zhewenl:auto-revert/pr-30681

Conversation

@zhewenl (Collaborator) commented Mar 5, 2026

Revert of #30681

This reverts the merge commit for PR #30681, which replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` across the codebase.

Reason

This PR is linked to 1 new CI failure in nightly build #54530:

  • Distributed Tests (4 GPUs): `test_torchrun_example_moe.py` fails with a KV cache memory error: available memory 0.49 GiB < needed 0.50 GiB. Replacing `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` may affect GPU memory reclamation behavior, causing this marginal shortfall.
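For context, the dispatch in question can be sketched as a small helper (the `empty_device_cache` name and the stand-in namespaces below are illustrative, not vLLM code; vLLM itself routes through its platform abstraction). Newer PyTorch builds expose a device-generic `torch.accelerator.empty_cache`, while older ones only have per-backend variants such as `torch.cuda.empty_cache`, so a guarded fallback looks roughly like this:

```python
from types import SimpleNamespace

# Hypothetical helper: prefer the device-generic accelerator API when the
# running PyTorch build provides it, otherwise fall back to the CUDA one.
def empty_device_cache(torch_mod):
    accelerator = getattr(torch_mod, "accelerator", None)
    if accelerator is not None and hasattr(accelerator, "empty_cache"):
        accelerator.empty_cache()
    else:
        torch_mod.cuda.empty_cache()

# Demo with stand-ins for the torch module, so the dispatch is visible
# without a GPU: record which cache-release path was taken.
calls = []
fake_torch_new = SimpleNamespace(
    accelerator=SimpleNamespace(empty_cache=lambda: calls.append("accelerator")),
    cuda=SimpleNamespace(empty_cache=lambda: calls.append("cuda")),
)
empty_device_cache(fake_torch_new)   # takes the accelerator path

fake_torch_old = SimpleNamespace(
    accelerator=None,
    cuda=SimpleNamespace(empty_cache=lambda: calls.append("cuda")),
)
empty_device_cache(fake_torch_old)   # falls back to the CUDA path
print(calls)
```

Even when both paths are available, the two APIs may not release cached blocks identically on every backend, which is consistent with the marginal 0.01 GiB shortfall seen in the failing test.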

Auto-generated

This revert PR was auto-generated by the CI failure analyzer. Please review before merging.

@mergify mergify bot (Contributor) commented Mar 5, 2026

Documentation preview: https://vllm--36076.org.readthedocs.build/en/36076/

@mergify mergify bot added the documentation, performance, nvidia, structured-output, v1 labels Mar 5, 2026
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request reverts a previous change that replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`, which caused CI failures. The revert is mostly mechanical, but in some platform-agnostic files it correctly uses the platform abstraction (`current_platform.empty_cache()`) instead of hardcoding `torch.cuda.empty_cache`, which is an improvement. However, I have identified a critical issue in `vllm/v1/worker/xpu_model_runner.py` where a monkey-patch is not reverted, potentially leading to side effects.

```python
if supports_xpu_graph():
    torch.cuda.graph = torch.xpu.graph
    torch.cuda.CUDAGraph = torch.xpu.XPUGraph
    torch.cuda.empty_cache = torch.xpu.empty_cache
```
critical

This monkey-patch is not reverted in a finally block, making it permanent for the process. This can cause unexpected behavior if other parts of the code expect the original `torch.cuda.empty_cache`; the same issue exists for `torch.cuda.graph` and `torch.cuda.CUDAGraph` in the original code. A context manager should restore the original state upon exit. Please save the original attributes before patching and restore them in a finally block.
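The suggested fix can be sketched as a small context manager (the `patched_attrs` helper name is hypothetical, not existing vLLM code): save the originals before patching and restore them in `finally`, so `torch.cuda` regains its original `graph`, `CUDAGraph`, and `empty_cache` attributes even if the patched section raises.

```python
import contextlib
from types import SimpleNamespace

@contextlib.contextmanager
def patched_attrs(obj, **patches):
    """Temporarily replace attributes on `obj`; restore the originals on
    exit, even if the body raises."""
    saved = {name: getattr(obj, name) for name in patches}
    try:
        for name, value in patches.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)

# In xpu_model_runner this would wrap the section that needs the XPU
# aliases, e.g. (illustrative only):
#   with patched_attrs(torch.cuda,
#                      graph=torch.xpu.graph,
#                      CUDAGraph=torch.xpu.XPUGraph,
#                      empty_cache=torch.xpu.empty_cache):
#       ...  # graph capture
# leaving torch.cuda untouched afterwards.

# Torch-free demonstration of the restore behavior:
target = SimpleNamespace(empty_cache="original")
with patched_attrs(target, empty_cache="patched"):
    inside = target.empty_cache   # patched value visible inside the block
after = target.empty_cache        # original value restored on exit
print(inside, after)
```

Using a context manager rather than paired patch/unpatch calls guarantees the restore runs on every exit path, which is exactly the gap the review flags.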

@mergify mergify bot (Contributor) commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2026
@jikunshang (Collaborator) commented Mar 9, 2026

#30681 ran all tests, and Distributed Tests (4 GPUs) passed; see https://buildkite.com/vllm/ci/builds/54293#019cb62a-38d7-4e77-a962-89d3fb0de589.
The strange thing is that it runs ibm-research/PowerMoE-3b instead of microsoft/Phi-mini-MoE-instruct.
Oh sorry, I didn't check the full log; it shows only some chat template errors.

My PR log:

```
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [decorators.py:588] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/0da436ac5f91ca7287450564ca8ac58a52973ee129f411ac065de87300f1e07d/rank_0_0/model
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [gpu_worker.py:424] Available KV cache memory: 4.87 GiB
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [kv_cache_utils.py:1314] GPU KV cache size: 21,888 tokens
```


@mergify mergify bot added the intel-gpu Related to Intel GPU label Mar 31, 2026

Labels

documentation, intel-gpu, needs-rebase, nvidia, performance, structured-output, v1

Projects

Status: No status

3 participants